Semi-Supervised Learning (SSL) has recently achieved strong results in fields such as image classification, object detection, and semantic segmentation, which typically require considerable labour to construct ground truth. In depth estimation especially, annotating training data is costly and time-consuming, which makes the SSL regime an attractive solution. In this paper, for the first time, we introduce a novel framework for semi-supervised learning of monocular depth estimation networks, using consistency regularization to reduce the reliance on large amounts of ground-truth depth data. We propose a novel data augmentation approach, called K-way disjoint masking, which allows the network to learn how to reconstruct invisible regions, so that the model not only becomes robust to perturbations but also generates globally consistent output depth maps. Experiments on the KITTI and NYU-Depth-v2 datasets demonstrate the effectiveness of each component in our pipeline, robustness as the number of annotated images decreases, and superior results compared to other state-of-the-art semi-supervised methods for monocular depth estimation. Our code is available at https://github.com/KU-CVLAB/MaskingDepth.
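The abstract does not spell out how K-way disjoint masking is built, so the following is only a minimal sketch of one plausible reading: the patch indices of an image are randomly partitioned into K disjoint groups, and each of the K views keeps exactly one group visible. The function name `k_way_disjoint_masks` and its parameters are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def k_way_disjoint_masks(num_patches, k, rng=None):
    """Return a (k, num_patches) boolean array of visibility masks.

    Assumption (not from the paper): every patch is visible in exactly
    one of the k views, so the views are disjoint and jointly cover the
    whole image.
    """
    rng = np.random.default_rng(rng)
    perm = rng.permutation(num_patches)      # random order of patch indices
    groups = np.array_split(perm, k)         # k disjoint index groups
    masks = np.zeros((k, num_patches), dtype=bool)
    for view, idx in enumerate(groups):
        masks[view, idx] = True              # patches visible in this view
    return masks

# Example: a 14x14 patch grid (196 patches) split into 4 views.
masks = k_way_disjoint_masks(196, 4, rng=0)
tokens = np.arange(196)                      # stand-in for patch tokens
visible_per_view = [tokens[m] for m in masks]
```

Under this reading, a consistency loss could compare the depth predicted from each partial view with the prediction from the unmasked image; that step is omitted here because the abstract gives no detail on it.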
translated by 谷歌翻译
We present a novel depth completion approach that is agnostic to the sparsity of the input depth points, which is likely to vary in many practical applications. State-of-the-art approaches yield accurate results only when processing a specific density and distribution of input points, i.e. the one observed during training, which limits their deployment in real use cases. In contrast, our solution is robust to uneven distributions and to extremely low densities never seen during training. Experimental results on standard indoor and outdoor benchmarks highlight the robustness of our framework, which achieves accuracy comparable to state-of-the-art methods when tested with the same density and distribution as in training, while being much more accurate in all other cases. Our pretrained models and further material are available on our project page.
In this paper, we propose the first real benchmark designed for evaluating Neural Radiance Fields (NeRFs) and, more generally, Neural Rendering (NR) frameworks. We design and implement an effective pipeline for scanning large numbers of real objects with little effort. Our scan station is built with a hardware budget of less than $500 and can collect roughly 4000 images of a scanned object in just 5 minutes. This platform is used to build ScanNeRF, a dataset characterized by several train/val/test splits aimed at benchmarking the performance of modern NeRF methods under different conditions. Accordingly, we evaluate three cutting-edge NeRF variants on it to highlight their strengths and weaknesses. The dataset is available on our project page, together with an online benchmark to foster the development of better NeRFs.
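We propose X-NeRF, a novel method, based on the Neural Radiance Fields formulation, for learning a cross-spectral scene representation from images captured by cameras with different spectral sensitivities. X-NeRF optimizes camera poses across spectra during training and exploits Normalized Cross-Device Coordinates (NXDC) to render images of different modalities from arbitrary viewpoints, aligned with each other and at the same resolution. Experiments on 16 forward-facing scenes, featuring color, multi-spectral, and infrared images, confirm the effectiveness of X-NeRF at modeling cross-spectral scene representations.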
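Self-supervised monocular depth estimation is an attractive solution that does not require hard-to-source depth labels for training. Convolutional neural networks (CNNs) have recently achieved great success in this task. However, their limited receptive field constrains existing network architectures to reason only locally, which dampens the effectiveness of the self-supervised paradigm. In light of the recent successes of Vision Transformers (ViTs), we propose MonoViT, a brand-new framework that combines the global reasoning enabled by ViT models with the flexibility of self-supervised monocular depth estimation. By combining plain convolutions with Transformer blocks, our model can reason both locally and globally, yielding depth predictions with a higher level of detail and accuracy and allowing MonoViT to achieve state-of-the-art performance on the established KITTI dataset. Moreover, MonoViT demonstrates superior generalization on other datasets such as Make3D and DrivingStereo.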
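Unsupervised Domain Adaptation (UDA) aims to reduce the domain gap between training and testing data and is, in most cases, carried out offline. However, domain changes may occur continuously and unpredictably during deployment (e.g. sudden weather changes). In such conditions, deep neural networks suffer dramatic drops in accuracy, and offline adaptation may not be enough to counteract them. In this paper, we tackle Online Domain Adaptation (OnDA) for semantic segmentation. We design a pipeline that is robust to both gradual and sudden domain shifts, and we evaluate it in rainy and foggy scenarios. Our experiments show that our framework can effectively adapt to new domains during deployment without suffering from catastrophic forgetting of previous domains.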
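We address the problem of registering synchronized color (RGB) and multi-spectral (MS) images by solving for stereo matching correspondences. To this end, we introduce a novel RGB-MS dataset framing 13 different scenes in indoor environments and providing 34 image pairs annotated with semi-dense, high-resolution ground-truth labels in the form of disparity maps. To tackle this task, we propose a deep learning architecture that is trained in a self-supervised manner by exploiting a further RGB camera, which is required only during training data acquisition. In this setup, we can conveniently learn cross-modal matching by distilling knowledge from an easier RGB-RGB matching task, based on a collection of about 11K unlabeled image triplets. Experiments show that the proposed pipeline sets a good performance bar (1.16 pixels average registration error) for future research on this novel, challenging task.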
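We present a novel, high-resolution, and challenging stereo dataset framing indoor scenes annotated with dense and accurate ground-truth disparities. Peculiar to our dataset is the presence of several specular and transparent surfaces, i.e. the main causes of failure for state-of-the-art stereo networks. Our acquisition pipeline leverages a novel deep space-time stereo framework that allows for easy and accurate labeling with sub-pixel precision. We release a total of 419 samples collected in 64 different scenes and annotated with dense ground-truth disparities. Each sample includes a high-resolution pair (12 Mpx) as well as an unbalanced pair (left: 12 Mpx, right: 1.1 Mpx). Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. We evaluate state-of-the-art deep networks on our dataset, highlighting their limitations in addressing the open challenges in stereo matching and drawing hints for future research.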
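In recent years, there has been growing interest in deep learning-based pansharpening. Research has mainly focused on architectures. However, since no ground truth is available, model training is also a major issue. A popular approach is to train networks in a reduced-resolution domain, using the original data as ground truth. The trained networks are then applied to full-resolution data, relying on an implicit scale-invariance assumption. Results are generally good at reduced resolution but more questionable at full resolution. Here, we propose a full-resolution training framework for deep learning-based pansharpening. Training takes place in the high-resolution domain, relying only on the original data, with no loss of information. To ensure spectral and spatial fidelity, a suitable loss is defined that forces the pansharpened output to be consistent with the available panchromatic and multispectral inputs. Experiments on WorldView-3, WorldView-2, and GeoEye-1 images show that methods trained with the proposed framework achieve excellent performance in terms of both full-resolution numerical indexes and visual quality. The framework is fully general and can be used to train and fine-tune any deep learning-based pansharpening network.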
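The abstract only states that the loss enforces consistency with the panchromatic (PAN) and multispectral (MS) inputs. As a rough illustration, a full-resolution loss could combine a spectral term (the output, once downsampled, should match the MS input) with a spatial term (each output band should correlate with the PAN image). The sketch below is an assumption on every detail: the block-average downsampling, the global correlation, the weight `beta`, and all function names are hypothetical and are not the loss defined by the authors.

```python
import numpy as np

def block_mean(img, ratio):
    """Downsample an (H, W, C) image by averaging ratio x ratio blocks."""
    h, w, c = img.shape
    return img.reshape(h // ratio, ratio, w // ratio, ratio, c).mean(axis=(1, 3))

def full_resolution_loss(fused, ms, pan, ratio=4, beta=1.0):
    """Hypothetical full-resolution loss.

    fused: (H, W, C) pansharpened output
    ms:    (H/ratio, W/ratio, C) original multispectral image
    pan:   (H, W) original panchromatic image
    """
    # Spectral term: the downsampled output should reproduce the MS input.
    spectral = np.abs(block_mean(fused, ratio) - ms).mean()
    # Spatial term: each band should be strongly correlated with the PAN.
    # (Undefined for constant bands; a real loss would need to guard this.)
    corrs = [np.corrcoef(fused[:, :, b].ravel(), pan.ravel())[0, 1]
             for b in range(fused.shape[2])]
    spatial = 1.0 - float(np.mean(corrs))
    return spectral + beta * spatial
```

Both terms are computed from the original PAN and MS data alone, which is the property the abstract emphasizes: no reduced-resolution ground truth is needed.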
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits along the many branches at its end, and passes identically to all the downward neurons of the network. Each downward neuron uses its copy of this signal as one of many dendritic inputs, integrates them all, and fires an output if the result is above some threshold. In the artificial neural network, this means that the nonlinear filtering of the signal is performed in the upward neuron, so that in practice the same activation is shared by all the downward neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model of the biological neuron, in which dendrites play an active role: the activation at the output of the upward neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filtering before the linear combination. We implement this new model in a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit in fully connected and convolutional layers and estimate the resulting change in FLOPs and number of weights. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements of up to 1.73% over standard ResNets. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
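To make the contrast concrete, the sketch below compares a standard fully connected ReLU layer with one possible form of the proposed unit, written in NumPy rather than Keras. It assumes that "independent nonlinear filtering" means each input-output connection (dendrite) has its own bias applied before a ReLU, followed by the usual weighted sum. That parameterization, and the names `standard_layer` and `dendritic_layer`, are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def standard_layer(x, W, b):
    """Standard unit: linear combination first, one shared ReLU after.

    x: (batch, n_in), W: (n_in, n_out), b: (n_out,)
    """
    return np.maximum(x @ W + b, 0.0)

def dendritic_layer(x, W, B):
    """Assumed dendritic unit: a ReLU per connection, then the weighted sum.

    x: (batch, n_in), W and B: (n_in, n_out).
    B[i, j] is a hypothetical per-dendrite bias for connection i -> j.
    """
    # Filter every input independently for every output neuron:
    # result has shape (batch, n_in, n_out).
    filtered = np.maximum(x[:, :, None] - B[None, :, :], 0.0)
    # Linear combination of the filtered signals over the input axis.
    return np.einsum("bio,io->bo", filtered, W)

rng = np.random.default_rng(0)
x = rng.random((5, 3))               # non-negative inputs
W = rng.normal(size=(3, 2))
B = np.zeros((3, 2))
y = dendritic_layer(x, W, B)         # shape (5, 2)
```

In this form the layer stores one extra bias per connection, which doubles the parameter count of a dense layer and is consistent with the abstract's remark that the FLOPs and weight counts change. With zero biases and non-negative inputs, the dendritic layer reduces to the plain linear map `x @ W`.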